Abstract - This paper derives fundamental limits associated
with compressive classification of Gaussian mixture source models.
In particular, we offer an asymptotic characterization of
the behavior of the (upper bound to the) misclassification
probability associated with the optimal Maximum-A-Posteriori 
(MAP) classifier that depends on quantities that are dual to 
the concepts of diversity gain and coding gain in multi-antenna 
communications. The diversity, which is shown to determine the 
rate at which the probability of misclassification decays in the low 
noise regime, is shown to depend on the geometry of the source, 
the geometry of the measurement system and their interplay. 
The measurement gain, which represents the counterpart of the 
coding gain, is also shown to depend on geometrical quantities. It 
is argued that the diversity order and the measurement gain also 
offer an optimization criterion to perform dictionary learning for 
compressive classification applications. 

I. Introduction 

Classification of high dimensional signals is fundamental to 
the broad fields of signal processing and machine learning. 
The aim is to increase speed and reliability while reducing 
the complexity of discrimination. An approach that has at- 
tracted a great deal of current interest is Compressed Sensing 
(CS) which seeks to capture important attributes of high- 
dimensional sparse signals from a small set of linear projections.
The observation [1], [2] that captured the imagination
of the signal processing community is that it is possible to
guarantee fidelity of reconstruction from random linear projections
when the source signal exhibits sparsity with respect
to some dictionary.

Within CS the challenge of signal reconstruction has at- 
tracted the greatest attention, but our focus is different. We 
are interested in detection rather than estimation, in problems 
such as hypothesis testing, pattern recognition and anomaly 
detection that can be viewed as instances of signal classifi- 
cation. It is also natural to employ compressive measurement 
here since it may be possible to discriminate between signal 
classes using only partial information about the source signal. 
The challenge now becomes that of designing measurements 
that ignore signal features with little discriminative power. In
fact, we would argue that the compressive nature of CS makes
the paradigm a better fit to classification than to reconstruction.

Compressive classification appears in the machine learning 
literature as feature extraction or supervised dimensionality 
reduction. Approaches based on geometrical characterizations 
of the source have been developed; some, like linear discriminant
analysis (LDA) and principal component analysis (PCA), depend only
on second-order statistics. Approaches based on higher-order
statistics of the source have also been developed [3]-[9].

In this paper we derive fundamental limits on compressive 
classification by drawing on measures of operational rele- 
vance: the probability of misclassification. We assume that 
the source signal is described by a Gaussian Mixture Model 
(GMM) that has already been learned. This assumption is
motivated in part by image processing, where GMMs have
been used very successfully to describe patches extracted from
natural images [10]. Our main contribution is a characterization
of the probability of misclassification as a function of
the geometry of the individual classes, their interplay and the 
number of measurements. We show that the fundamental limits 
of signal classification are determined by quantities that can 
be interpreted as the duals of quantities that determine the 
fundamental limits of multi-antenna communication systems. 
These quantities include the diversity order and the coding 
gain, which characterize the error probability in multiple-input
multiple-output (MIMO) systems in the regime of high
signal-to-noise ratio (SNR) [11], [12]. We note that wireless
communication involves classification rather than reconstruction since
the aim is to discriminate the transmitted codeword from the 
alternative codewords. 

We use the following notation: boldface upper-case letters
denote matrices ($\mathbf{X}$), boldface lower-case letters denote
column vectors ($\mathbf{x}$) and italics denote scalars ($x$); the
context defines whether the quantities are deterministic or random.
The symbol $\mathbf{I}$ represents the identity matrix. The operators
$(\cdot)^T$, $\mathrm{rank}(\cdot)$, $\det(\cdot)$ and
$\mathrm{pdet}(\cdot)$ represent the transpose operator, the rank
operator, the determinant operator and the pseudo-determinant
operator, respectively. The symbol $\log(\cdot)$ denotes
the natural logarithm. For reasons of space, we relegate the
mathematical proofs of our results to an upcoming journal
paper [13].

II. The Compressive Classification Problem 

We consider a classification problem in the presence of 
compressive and noisy measurements. In particular, we use 
the standard measurement model given by: 



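$$\mathbf{y} = \boldsymbol{\Phi}\mathbf{x} + \mathbf{n}, \tag{1}$$

where $\mathbf{y} \in \mathbb{R}^{M}$ represents the measurement vector,
$\mathbf{x} \in \mathbb{R}^{N}$ represents the source vector,
$\boldsymbol{\Phi} \in \mathbb{R}^{M \times N}$ represents the
measurement matrix and
$\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}) \in \mathbb{R}^{M}$
represents standard white Gaussian noise.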

We take the measurement matrix to be such that its elements
are drawn independently from a zero-mean Gaussian
distribution with a certain fixed variance, which is common in
various CS problems [1], [2]. We also take the source signal to
follow the well-known GMM, which has been shown to lead 
to state-of-the-art results in various classification applications 
including hyper-spectral imaging and digit recognition [8].
This model assumes that the source signal is drawn from
one out of $L$ classes $C_i$, $i = 1, \ldots, L$, with probability
$P_i$, $i = 1, \ldots, L$, and that the distribution of the source
conditioned on $C_i$ is Gaussian with mean
$\boldsymbol{\mu}_i \in \mathbb{R}^{N}$ and (possibly rank-deficient)
covariance matrix $\boldsymbol{\Sigma}_i \in \mathbb{R}^{N \times N}$.
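
As a minimal illustration of the measurement and source models above, the following NumPy sketch draws class labels, source vectors and noisy compressive measurements. The names `low_rank_cov` and `sample_measurements`, the dimensions, and the unit variance of the entries of the measurement matrix are assumptions made for this example only; they are not part of the paper.

```python
import numpy as np

def low_rank_cov(N, rank, rng):
    """Hypothetical helper: random N x N covariance of the given rank."""
    A = rng.standard_normal((N, rank))
    return A @ A.T

def sample_measurements(means, covs, priors, Phi, sigma2, n_samples, rng):
    """Draw labels c ~ priors, x | c ~ N(mu_c, Sigma_c), and y = Phi x + n."""
    M, N = Phi.shape
    labels = rng.choice(len(priors), size=n_samples, p=priors)
    X = np.empty((n_samples, N))
    for c, (mu, Sigma) in enumerate(zip(means, covs)):
        idx = np.flatnonzero(labels == c)
        # Factor Sigma = F F^T; eigenvalues at round-off level are set to
        # zero so that rank-deficient covariances are sampled exactly.
        w, V = np.linalg.eigh(Sigma)
        w = np.where(w > 1e-10 * w.max(), w, 0.0)
        F = V * np.sqrt(w)
        X[idx] = mu + rng.standard_normal((idx.size, N)) @ F.T
    noise = np.sqrt(sigma2) * rng.standard_normal((n_samples, M))
    return labels, X @ Phi.T + noise

rng = np.random.default_rng(0)
N, M = 6, 4                                   # assumed dimensions
covs = [low_rank_cov(N, 2, rng), low_rank_cov(N, 3, rng)]
means = [np.zeros(N), np.zeros(N)]
Phi = rng.standard_normal((M, N))             # i.i.d. zero-mean Gaussian entries
labels, Y = sample_measurements(means, covs, [0.5, 0.5], Phi, 1e-2, 1000, rng)
print(Y.shape)
```

Each row of `Y` is one measurement vector; setting `sigma2` to zero leaves the measurements of a class confined to a sub-space whose dimension is the rank of the projected class covariance.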

The objective is to produce an estimate of the true signal
class given the measurement vector. The Maximum-A-Posteriori
(MAP) classifier, which minimizes the probability
of misclassification [14], produces the estimate given by:

$$\hat{C} = \arg\max_{C_i} P(C_i \mid \mathbf{y}) = \arg\max_{C_i} p(\mathbf{y} \mid C_i)\, P_i, \tag{2}$$

where $P(C_i \mid \mathbf{y})$ represents the a posteriori probability of class
$C_i$ given the measurement vector $\mathbf{y}$ and $p(\mathbf{y} \mid C_i)$ represents
the probability density function of the measurement vector $\mathbf{y}$
given the class $C_i$.
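
The MAP rule in (2) can be sketched in a few lines of NumPy. The sketch relies on the fact that, under the model above, the measurement vector conditioned on class $C_i$ is Gaussian with mean $\boldsymbol{\Phi}\boldsymbol{\mu}_i$ and covariance $\boldsymbol{\Phi}\boldsymbol{\Sigma}_i\boldsymbol{\Phi}^T + \sigma^2\mathbf{I}$. The function name `map_classify` and the toy parameters are hypothetical.

```python
import numpy as np

def map_classify(Y, means, covs, priors, Phi, sigma2):
    """Return the MAP class index for each row of Y (shape: n x M)."""
    M = Phi.shape[0]
    scores = []
    for mu, Sigma, P in zip(means, covs, priors):
        # Under class i, y ~ N(Phi mu_i, Phi Sigma_i Phi^T + sigma2 I).
        C = Phi @ Sigma @ Phi.T + sigma2 * np.eye(M)
        R = Y - Phi @ mu
        _, logdet = np.linalg.slogdet(C)
        maha = np.einsum("ij,ji->i", R, np.linalg.solve(C, R.T))
        # log p(y | C_i) + log P_i, dropping the term common to all classes.
        scores.append(np.log(P) - 0.5 * (logdet + maha))
    return np.argmax(np.column_stack(scores), axis=1)

# Toy example (assumed values): two well-separated classes, M = 2, N = 3.
Phi = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
means = [np.array([5.0, 0.0, 0.0]), np.array([-5.0, 0.0, 0.0])]
covs = [0.1 * np.eye(3), 0.1 * np.eye(3)]
Y = np.array([[4.8, 0.1], [-5.2, -0.3]])
print(map_classify(Y, means, covs, [0.5, 0.5], Phi, 1e-2))   # [0 1]
```

Working with log-densities and `slogdet` avoids underflow at low noise levels, where the class-conditional covariances become nearly singular.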

We base the analysis - in line with the standard practice
in multiple-antenna communications systems [11], [12] - on
an upper bound to the probability of misclassification of the
MAP classifier, $\bar{P}_{err}$, rather than the exact probability of
misclassification $P_{err}$. We also base the analysis on two
fundamental metrics that characterize the asymptotic performance
of the upper bound to the probability of misclassification in
the low noise regime, which is relevant to various emerging
classification tasks [8]. In particular, we define the diversity
order of the measurement model in (1) as:

$$d = \lim_{\sigma^2 \to 0} \frac{\log \bar{P}_{err}(\sigma^2)}{\log \sigma^2}, \tag{3}$$

which determines how (the upper bound to) the misclassification
probability decays at low noise levels [15], [16]. We also
define the measurement gain of the measurement model in (1)
as:

$$g_m = \lim_{\sigma^2 \to 0} \sigma^2 \cdot \frac{1}{\sqrt[d]{\bar{P}_{err}(\sigma^2)}}, \tag{4}$$

which determines the offset of (the upper bound to) the
misclassification error probability at low noise levels. These quantities
admit a counterpart in multiple-antenna communications - for
example, the measurement gain corresponds to the standard
coding gain. It turns out that the behavior of the upper bound
to the misclassification probability closely mimics the behavior
of the exact misclassification probability - as shown in the
sequel - bearing witness to the value of the approach.

The characterization of the performance measures in (3)
and (4) will be expressed via quantities that relate to the
geometry of the measurement model, namely, the rank and
the pseudo-determinant of certain matrices. In particular, we
characterize the behavior of (3) and (4) via the geometry of the
linear transformation of the source signal effected by the
measurement "channel", by using the quantities:

• $r_i = \mathrm{rank}(\boldsymbol{\Phi}\boldsymbol{\Sigma}_i\boldsymbol{\Phi}^T)$ and
$v_i = \mathrm{pdet}(\boldsymbol{\Phi}\boldsymbol{\Sigma}_i\boldsymbol{\Phi}^T)$, which
measure the dimension and volume, respectively, of the
sub-space spanned by the linear transformation of the
signals in class $C_i$;
• $r_{ij} = \mathrm{rank}(\boldsymbol{\Phi}(\boldsymbol{\Sigma}_i + \boldsymbol{\Sigma}_j)\boldsymbol{\Phi}^T)$ and
$v_{ij} = \mathrm{pdet}(\boldsymbol{\Phi}(\boldsymbol{\Sigma}_i + \boldsymbol{\Sigma}_j)\boldsymbol{\Phi}^T)$,
which measure the dimension and volume,
respectively, of the union of sub-spaces spanned by the
linear transformation of the signals in classes $C_i$ or $C_j$.

We also characterize the behavior of (3) and (4) via the geometry
of the original source signal, by using the quantities:

• $r_{\Sigma_i} = \mathrm{rank}(\boldsymbol{\Sigma}_i)$, which relates to the dimension of the
sub-space spanned by input signals in $C_i$;
• $r_{\Sigma_{ij}} = \mathrm{rank}(\boldsymbol{\Sigma}_i + \boldsymbol{\Sigma}_j)$, which relates to the dimension
of the union of sub-spaces spanned by input signals in
$C_i$ or $C_j$.

We argue that this two-step approach offers further insight
into the characteristics of the compressive classification problem,
by allowing us to untangle in a systematic manner the
effect of the measurement matrix and the effect of the source
geometry.
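
The ranks and pseudo-determinants listed above are straightforward to evaluate numerically. The sketch below is one way to do so, assuming symmetric positive semidefinite inputs and a relative eigenvalue threshold `rtol` that is our choice; the function names and the diagonal toy matrices are hypothetical.

```python
import numpy as np

def rank_and_pdet(A, rtol=1e-10):
    """Rank and pseudo-determinant (product of the non-zero eigenvalues)
    of a symmetric positive semidefinite matrix A."""
    w = np.linalg.eigvalsh(A)
    nonzero = w[w > rtol * np.abs(w).max()]
    return nonzero.size, float(np.prod(nonzero))

def pair_geometry(S1, S2, Phi):
    """Quantities r_i, v_i, r_12, v_12 and the source ranks for one pair."""
    r1, v1 = rank_and_pdet(Phi @ S1 @ Phi.T)
    r2, v2 = rank_and_pdet(Phi @ S2 @ Phi.T)
    r12, v12 = rank_and_pdet(Phi @ (S1 + S2) @ Phi.T)
    return {"r1": r1, "v1": v1, "r2": r2, "v2": v2, "r12": r12, "v12": v12,
            "r_S1": rank_and_pdet(S1)[0], "r_S2": rank_and_pdet(S2)[0],
            "r_S12": rank_and_pdet(S1 + S2)[0]}

# Toy example (assumed values): N = 4, M = 3.
S1 = np.diag([1.0, 2.0, 0.0, 0.0])
S2 = np.diag([0.0, 0.0, 3.0, 0.0])
Phi = np.eye(4)[:3]          # keeps the first three coordinates
g = pair_geometry(S1, S2, Phi)
print(g["r1"], g["r2"], g["r12"])   # 2 1 3
```

Separating the projected quantities from the source ranks mirrors the two-step view in the text: the former depend on the measurement matrix, the latter only on the source.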

III. The Case of Two Classes

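We now consider a two-class compressive classification
problem. The Bhattacharyya bound, which represents a
specialization of the Chernoff bound [17], leads to an upper bound
to the probability of misclassification given by [14]:

$$\bar{P}_{err} = \sqrt{P_1 P_2}\; e^{-K(1,2)}, \tag{5}$$

where

$$K(i,j) = \frac{1}{8} (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)^T \boldsymbol{\Phi}^T \left[ \frac{\boldsymbol{\Phi}(\boldsymbol{\Sigma}_i + \boldsymbol{\Sigma}_j)\boldsymbol{\Phi}^T + 2\sigma^2 \mathbf{I}}{2} \right]^{-1} \boldsymbol{\Phi} (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j) + \frac{1}{2} \log \frac{\det\left( \frac{\boldsymbol{\Phi}(\boldsymbol{\Sigma}_i + \boldsymbol{\Sigma}_j)\boldsymbol{\Phi}^T + 2\sigma^2 \mathbf{I}}{2} \right)}{\sqrt{\det\left(\boldsymbol{\Phi}\boldsymbol{\Sigma}_i\boldsymbol{\Phi}^T + \sigma^2 \mathbf{I}\right) \det\left(\boldsymbol{\Phi}\boldsymbol{\Sigma}_j\boldsymbol{\Phi}^T + \sigma^2 \mathbf{I}\right)}}. \tag{6}$$

The Bhattacharyya-based upper bound to the probability of
misclassification encapsulated in (5) and (6) is the basis of
the ensuing analysis. This analysis treats the case where the
classes are zero-mean, i.e. $\boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \mathbf{0}$, and the case where the
classes are nonzero-mean, i.e. $\boldsymbol{\mu}_1 \neq \mathbf{0}$ or $\boldsymbol{\mu}_2 \neq \mathbf{0}$, separately.
The zero-mean case exhibits the main operational features
of the compressive classification problem; the nonzero-mean
case occasionally exhibits additional operational features, e.g.
infinite diversity order.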
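
To make the bound concrete, the sketch below evaluates it numerically, assuming the standard closed form of the Bhattacharyya bound for two Gaussian classes observed through the measurement model in (1). It also estimates the diversity order defined in (3) by a finite difference of the log-bound at two small noise levels. The function names, the noise levels and the diagonal toy covariances are assumptions for illustration.

```python
import numpy as np

def bhattacharyya_bound(mu1, S1, mu2, S2, P1, Phi, sigma2):
    """sqrt(P1 P2) exp(-K) for the two Gaussians seen through y = Phi x + n."""
    I = np.eye(Phi.shape[0])
    C1 = Phi @ S1 @ Phi.T + sigma2 * I
    C2 = Phi @ S2 @ Phi.T + sigma2 * I
    C = 0.5 * (C1 + C2)
    dm = Phi @ (mu1 - mu2)
    logdet = lambda A: np.linalg.slogdet(A)[1]
    # Mean-dependent term plus covariance-dependent term of the exponent.
    K = (0.125 * dm @ np.linalg.solve(C, dm)
         + 0.5 * (logdet(C) - 0.5 * (logdet(C1) + logdet(C2))))
    return np.sqrt(P1 * (1.0 - P1)) * np.exp(-K)

def slope_estimate(bound, s_hi=1e-8, s_lo=1e-10):
    """Finite-difference estimate of d log(bound) / d log(sigma2)."""
    return ((np.log(bound(s_hi)) - np.log(bound(s_lo)))
            / (np.log(s_hi) - np.log(s_lo)))

# Toy example (assumed values): zero-mean classes on orthogonal axes, Phi = I.
S1, S2 = np.diag([1.0, 0.0, 0.0]), np.diag([0.0, 1.0, 0.0])
zero = np.zeros(3)
bound = lambda s: bhattacharyya_bound(zero, S1, zero, S2, 0.5, np.eye(3), s)
print(round(slope_estimate(bound), 3))   # 0.5
```

In this toy case the bound equals $\frac{1}{2}\sqrt{\sigma^2(1+\sigma^2)}/(\frac{1}{2}+\sigma^2)$, so the estimated slope of 0.5 can be checked by hand.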

A. Zero-Mean Classes 

The following Theorem offers a view of the asymptotic be- 
havior of the probability of misclassification for the two-class 
compressive classification problem with zero-mean classes, by 
leveraging directly the geometry of the linear transformation 
of the source signal effected by the measurement "channel". 



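Theorem 1: Consider the measurement model in (1) where
$\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_1)$ with probability $P_1$ and
$\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_2)$ with
probability $P_2 = 1 - P_1$. Then, the upper bound to the
probability of misclassification in (5) behaves as:

• If $\frac{r_1 + r_2}{2} = r_{12}$ then,

$$\bar{P}_{err} = O(1), \quad \sigma^2 \to 0. \tag{7}$$

• If $\frac{r_1 + r_2}{2} < r_{12}$ then,

$$\bar{P}_{err} = \left( \frac{g_m}{\sigma^2} \right)^{-d} + o\left( \left( \frac{1}{\sigma^2} \right)^{-d} \right), \quad \sigma^2 \to 0,$$

where

$$d = -\frac{1}{2} \left( \frac{r_1 + r_2}{2} - r_{12} \right)$$

and

$$g_m = \left[ \sqrt{P_1 P_2} \left( \frac{2^{r_{12}} \sqrt{v_1 v_2}}{v_{12}} \right)^{1/2} \right]^{-1/d}.$$

The following Theorem now describes the asymptotic behavior
of the probability of misclassification for the two-class
compressive classification problem with zero-mean classes, by
leveraging instead the geometry of the source signals. The
result uses the fact that $N \geq r_{\Sigma_{12}} \geq \max(r_{\Sigma_1}, r_{\Sigma_2})$ and,
with probability 1, $r_1 = \min(M, r_{\Sigma_1})$, $r_2 = \min(M, r_{\Sigma_2})$
and $r_{12} = \min(M, r_{\Sigma_{12}})$. The result also assumes, without
loss of generality, that $r_{\Sigma_1} \leq r_{\Sigma_2}$.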

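Theorem 2: Consider the measurement model in (1) where
$\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_1)$ with probability $P_1$ and
$\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_2)$ with
probability $P_2 = 1 - P_1$. Then, the upper bound to the
probability of misclassification in (5) behaves as:

• If $M \leq r_{\Sigma_1} \leq r_{\Sigma_2} \leq r_{\Sigma_{12}}$, or
$r_{\Sigma_1} \leq r_{\Sigma_2} \leq r_{\Sigma_{12}} < M$ and
$\frac{r_{\Sigma_1} + r_{\Sigma_2}}{2} = r_{\Sigma_{12}}$, then

$$\bar{P}_{err} = O(1), \quad \sigma^2 \to 0. \tag{11}$$

• Otherwise,

$$\bar{P}_{err} = \left( \frac{g_m}{\sigma^2} \right)^{-d} + o\left( \left( \frac{1}{\sigma^2} \right)^{-d} \right), \quad \sigma^2 \to 0,$$

where $g_m$ is the measurement gain defined in (4), and when
$r_{\Sigma_1} < M \leq r_{\Sigma_2} \leq r_{\Sigma_{12}}$:

$$d = -\frac{1}{2} \left( \frac{r_{\Sigma_1} + M}{2} - M \right),$$

when $r_{\Sigma_1} \leq r_{\Sigma_2} < M \leq r_{\Sigma_{12}}$:

$$d = -\frac{1}{2} \left( \frac{r_{\Sigma_1} + r_{\Sigma_2}}{2} - M \right),$$

and when $r_{\Sigma_1} \leq r_{\Sigma_2} \leq r_{\Sigma_{12}} < M$ and
$\frac{r_{\Sigma_1} + r_{\Sigma_2}}{2} < r_{\Sigma_{12}}$:

$$d = -\frac{1}{2} \left( \frac{r_{\Sigma_1} + r_{\Sigma_2}}{2} - r_{\Sigma_{12}} \right). \tag{16}$$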



It is now instructive to probe further the characterizations
embodied in Theorems 1 and 2 to infer the main operational
features of the two-class compressive classification problem.

The characterization encapsulated in Theorem 1 admits a
very simple interpretation:

• if $\frac{r_1 + r_2}{2} = r_{12}$, then the sub-spaces spanned by the
signals in classes 1 and 2 overlap completely - the
upper bound to the misclassification probability exhibits
an error floor because it is not possible to distinguish the
classes perfectly as the noise level approaches zero;

• if $\frac{r_1 + r_2}{2} < r_{12}$, then the sub-spaces spanned by the
signals in classes 1 and 2 do not overlap completely - the
upper bound to the misclassification error probability (and
the true error probability) then does not exhibit an error
floor, as it is possible to distinguish the classes perfectly as
the noise level approaches zero. The lower the degree of
overlap, the higher the diversity order - this is measured
via the interplay of the various ranks $r_1$, $r_2$ and $r_{12}$;

• the scenario $\frac{r_1 + r_2}{2} > r_{12}$ is not possible in view of the
geometry of the two-class problem.

On the other hand, the characterization encapsulated in
Theorem 2 offers the means to articulate the interplay
between the number of measurements and the source geometry.
Of particular importance:

• if $M \leq r_{\Sigma_1}$, the upper bound will exhibit an error floor at
low noise levels; conversely, if $M > r_{\Sigma_1}$ and
$\frac{r_{\Sigma_1} + r_{\Sigma_2}}{2} < r_{\Sigma_{12}}$, the upper bound will not exhibit
such an error floor at low noise levels;

• in addition, if $M > r_{\Sigma_{12}}$, additional measurements will
have no impact on the diversity order.

Overall, it is possible to argue that the diversity order is a
function of the difference between the sub-spaces associated
with the two classes, which is given by
$r_{\Sigma_{12}} - \frac{r_{\Sigma_1} + r_{\Sigma_2}}{2}$: by gradually
increasing the number of measurements from 1 up to $r_{\Sigma_{12}}$
it is possible to extract the highest diversity level, equal to
(16); however, increasing the number of measurements past
$r_{\Sigma_{12}}$ does not offer a higher diversity level - instead, it
only translates into a higher measurement gain. One then
understands the role of measurement as a way to probe the
differences between the classes.
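
The dependence of the diversity order on the number of measurements can be tabulated directly. The sketch below assumes zero-mean classes, the relation $r_i = \min(M, r_{\Sigma_i})$ that holds with probability 1 for a Gaussian measurement matrix, and the diversity expression $d = -\frac{1}{2}\left(\frac{r_1 + r_2}{2} - r_{12}\right)$; the function name is hypothetical and the source ranks (2, 3, 4) are chosen for illustration.

```python
from fractions import Fraction

def diversity_order(M, r_s1, r_s2, r_s12):
    """Diversity order of the bound for zero-mean classes, using
    r_i = min(M, r_Sigma_i). A value of 0 signals an error floor."""
    r1, r2, r12 = min(M, r_s1), min(M, r_s2), min(M, r_s12)
    return Fraction(1, 2) * (r12 - Fraction(r1 + r2, 2))

# Walk over the number of measurements for source ranks (2, 3, 4), N = 6.
for M in range(1, 7):
    print(M, diversity_order(M, 2, 3, 4))
# 1 0
# 2 0
# 3 1/4
# 4 3/4
# 5 3/4
# 6 3/4
```

The table shows the three regimes discussed above: an error floor for small $M$, a diversity order that grows with $M$, and saturation once $M$ reaches $r_{\Sigma_{12}}$.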

In contrast, the measurement gain is a function of the
exact geometry of the classes in the Gaussian mixture model.
It increases with the ratio of the product of the non-zero
eigenvalues of $\boldsymbol{\Phi}(\boldsymbol{\Sigma}_1 + \boldsymbol{\Sigma}_2)\boldsymbol{\Phi}^T$ to the product of the singular
values of $\boldsymbol{\Phi}\boldsymbol{\Sigma}_1\boldsymbol{\Phi}^T$ and $\boldsymbol{\Phi}\boldsymbol{\Sigma}_2\boldsymbol{\Phi}^T$.

We note that there is often flexibility in the definition of the
properties of signal classes of a GMM, i.e. the dictionary [10].
Measurement gain, and to a certain extent the diversity gain,
can then provide an optimization criterion for dictionary
design for compressive classification applications.

B. Nonzero-Mean Classes

The following Theorem generalizes the description of the 
asymptotic behavior of the probability of misclassification 
from the zero-mean to the nonzero-mean two-class compres- 
sive classification problem. 



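Theorem 3: Consider the measurement model in (1) where
$\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1)$ with probability $P_1$ and
$\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}_2, \boldsymbol{\Sigma}_2)$ with
probability $P_2 = 1 - P_1$ and $\boldsymbol{\mu}_1 \neq \boldsymbol{\mu}_2 \neq \mathbf{0}$. If

$$\mathrm{im}\left(\boldsymbol{\Phi}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^T\boldsymbol{\Phi}^T\right) \nsubseteq \mathrm{im}\left(\boldsymbol{\Phi}(\boldsymbol{\Sigma}_1 + \boldsymbol{\Sigma}_2)\boldsymbol{\Phi}^T\right), \tag{17}$$

then the upper bound to the probability of misclassification in
(5) decays exponentially with $1/\sigma^2$ as $\sigma^2 \to 0$; otherwise,

$$\bar{P}_{err} = \left( a \cdot \frac{g_m}{\sigma^2} \right)^{-d} + o\left( \left( \frac{1}{\sigma^2} \right)^{-d} \right), \quad \sigma^2 \to 0, \tag{18}$$

where $a > 1$ is a finite constant which depends on the first
term in (6), and $g_m$ and $d$ are as in Theorems 1 and 2.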

The characterization embodied in Theorem 3 illustrates that
the asymptotic behavior of the upper bound of the error
probability for classes with non-zero mean can be different
from that for classes with zero mean. The differences in
behavior trace back to the fact that $M > r_{\Sigma_{12}}$ represents
a necessary condition for condition (17) to hold. In the
non-zero mean case, choosing $M > r_{\Sigma_{12}}$ leads to a diversity order
$d = \infty$; in contrast, in the zero-mean case choosing $M > r_{\Sigma_{12}}$
does not affect the diversity order. Letting $M \leq r_{\Sigma_{12}}$ induces
the same diversity order both for nonzero-mean and zero-mean
classes. The presence of the nonzero mean here then impacts
only the measurement gain.

One concludes that in the non-zero mean case, increasing
the number of measurements past $r_{\Sigma_{12}}$ can have a dramatic
effect on the performance. Geometrically, this result reflects
the fact that, when embedded in a higher dimensional space
($\mathbb{R}^M$ in our case), the affine spaces corresponding to the
classes are separated when $\boldsymbol{\mu}_1 \neq \boldsymbol{\mu}_2 \neq \mathbf{0}$.
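
Condition (17) can be checked numerically. Since the image of the rank-one matrix on its left-hand side is the span of the projected mean difference, the condition amounts to asking whether that vector lies outside the range of the projected sum of covariances. The sketch below tests this by comparing matrix ranks; the function name, the tolerance and the toy matrices are assumptions for illustration.

```python
import numpy as np

def mean_escapes_covariance_range(mu1, S1, mu2, S2, Phi, tol=1e-9):
    """True when Phi (mu1 - mu2) is not in the range of Phi (S1 + S2) Phi^T."""
    A = Phi @ (S1 + S2) @ Phi.T
    v = Phi @ (mu1 - mu2)
    rank_A = np.linalg.matrix_rank(A, tol=tol)
    # Appending v as a column raises the rank only if v leaves the range of A.
    rank_Av = np.linalg.matrix_rank(np.column_stack([A, v]), tol=tol)
    return bool(rank_Av > rank_A)

# Toy example (assumed values): rank(S1 + S2) = 2, mean difference along axis 3.
S1, S2 = np.diag([1.0, 0.0, 0.0]), np.diag([0.0, 1.0, 0.0])
mu1, mu2 = np.array([0.0, 0.0, 1.0]), np.zeros(3)
print(mean_escapes_covariance_range(mu1, S1, mu2, S2, np.eye(3)))      # True
print(mean_escapes_covariance_range(mu1, S1, mu2, S2, np.eye(3)[:2]))  # False
```

With three measurements the mean difference is visible outside the covariance range; with only two, the projected covariance already fills the measurement space, so the condition cannot hold.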

IV. The Case of Multiple Classes 

We now consider a multiple-class compressive classification
problem, where $L \geq 3$. The generalization of the two-class
results to the multiple-class case is possible by using the union
bound in conjunction with the two-class Bhattacharyya bound.

The combination of the union bound with the Bhattacharyya
bound leads immediately to an upper bound to the probability
of misclassification given by [18]:

$$\bar{P}_{err} = \sum_{i=1}^{L} \sum_{\substack{j=1 \\ j \neq i}}^{L} \sqrt{\frac{P_j}{P_i}}\; e^{-K(i,j)}\, P_i, \tag{19}$$

where $K(i,j)$ is given by (6).

The fact that the form of the upper bound in (19) is akin
to the form of the upper bound in (5), involving in addition
only various pair-wise misclassification terms that capture
the interaction between the different classes, leads to the
immediate generalization of the results encapsulated in the
previous Theorems.

In particular, we can argue that the upper bound to the
misclassification probability will exhibit an error floor if
at least one of the pair-wise misclassification probabilities
also exhibits an error floor. Conversely, the misclassification
probability will tend to zero as $\sigma^2$ tends to zero if all the
pairwise misclassification probabilities also tend to zero.

Figure 1. Upper bound to error probability and true error probability vs.
noise level, for different numbers of measurements $M$, for the zero-mean
two-class compressive classification problem ($r_{\Sigma_1} = 2$, $r_{\Sigma_2} = 3$,
$r_{\Sigma_{12}} = 4$, $\boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \mathbf{0}$ and $N = 6$).

The diversity order of the multiple-class misclassification
probability now corresponds to the lowest diversity order
of the pairwise misclassification probabilities. Similarly, the
measurement gain of the multiple-class misclassification
probability corresponds to the measurement gain of the pairwise
misclassification probability associated with the
lowest diversity order. Therefore, it is possible to capitalize
on the results embodied in Theorems 1, 2 and 3 to understand
the behavior of the multiple-class compressive classification
problem immediately.
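
The dominance of the worst pair can be observed numerically. The sketch below sums pairwise Bhattacharyya terms over ordered pairs of classes, assuming zero-mean classes so that only the covariance term of the exponent appears, and estimates the low-noise slope of the resulting bound. The function names, the three diagonal covariances and the noise levels are hypothetical choices for illustration.

```python
import numpy as np
from itertools import permutations

def pair_exponent(Si, Sj, Phi, sigma2):
    """K(i, j) for zero-mean classes (covariance term only)."""
    I = np.eye(Phi.shape[0])
    Ci = Phi @ Si @ Phi.T + sigma2 * I
    Cj = Phi @ Sj @ Phi.T + sigma2 * I
    logdet = lambda A: np.linalg.slogdet(A)[1]
    return 0.5 * (logdet(0.5 * (Ci + Cj)) - 0.5 * (logdet(Ci) + logdet(Cj)))

def union_bound(covs, priors, Phi, sigma2):
    """Sum over ordered pairs i != j of sqrt(P_i P_j) exp(-K(i, j))."""
    return sum(np.sqrt(priors[i] * priors[j])
               * np.exp(-pair_exponent(covs[i], covs[j], Phi, sigma2))
               for i, j in permutations(range(len(covs)), 2))

# Toy example (assumed values): pairwise slopes are 3/4, 3/4 and 1/2.
covs = [np.diag([1.0, 0.0, 0.0, 0.0]),
        np.diag([0.0, 1.0, 1.0, 0.0]),
        np.diag([0.0, 1.0, 0.0, 1.0])]
priors = [1 / 3] * 3
Phi = np.eye(4)
s_hi, s_lo = 1e-12, 1e-14
slope = ((np.log(union_bound(covs, priors, Phi, s_hi))
          - np.log(union_bound(covs, priors, Phi, s_lo)))
         / (np.log(s_hi) - np.log(s_lo)))
print(round(slope, 2))   # 0.5
```

The estimated slope matches the smallest pairwise value, here that of the second and third classes, whose sub-spaces overlap the most.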

V. Numerical Results 

We now present a series of results that illustrate the main 
operational features of compressive classification of Gaussian 
mixture models. In particular, we consider three experiments 
where the exact covariance matrices and/or mean vectors of 
the classes have been generated randomly: 

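• A two-class compressive classification problem with
zero-mean classes where $L = 2$, $r_{\Sigma_1} = 2$, $r_{\Sigma_2} = 3$,
$r_{\Sigma_{12}} = 4$, $\boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \mathbf{0}$ and $N = 6$ (see Figure 1);

• A two-class compressive classification problem with
nonzero-mean classes where $L = 2$, $r_{\Sigma_1} = 2$, $r_{\Sigma_2} = 2$,
$r_{\Sigma_{12}} = 2$, $\boldsymbol{\mu}_1 \neq \boldsymbol{\mu}_2 \neq \mathbf{0}$ and $N = 6$ (see Figure 2);

• A multi-class classification problem where $L = 4$, $r_{\Sigma_1} = 2$,
$r_{\Sigma_2} = 3$, $r_{\Sigma_3} = 3$, $r_{\Sigma_4} = 2$, $r_{\Sigma_{12}} = 4$, $r_{\Sigma_{13}} = 5$,
$r_{\Sigma_{14}} = 4$, $r_{\Sigma_{23}} = 4$, $r_{\Sigma_{24}} = 5$, $r_{\Sigma_{34}} = 4$,
$\boldsymbol{\mu}_i = \mathbf{0}$, $i = 1, \ldots, L$ and $N = 6$ (see Figure 3).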

The results confirm the asymptotic behavior of the upper
bound to the misclassification probability (as well as the true
misclassification probability) unveiled in Theorems 1-3.

Figure 1 shows that, for $M = 1, 2$, the upper bound exhibits
an error floor, as expected. As $M$ increases ($M = 3, 4$) we can
observe that i) it becomes possible to drive the upper bound
to the misclassification probability to zero at low noise levels,
and therefore perform classification without errors; and ii) the
diversity gain also increases with $M$. We can also verify that
if $M > r_{\Sigma_{12}}$, having more measurements does not increase
the diversity gain, but yields a higher measurement gain.

Figure 2. Upper bound to error probability and true error probability vs.
noise level, for different numbers of measurements $M$, for a non-zero-mean
two-class compressive classification problem ($r_{\Sigma_1} = 2$, $r_{\Sigma_2} = 2$,
$r_{\Sigma_{12}} = 2$, $\boldsymbol{\mu}_1 \neq \boldsymbol{\mu}_2 \neq \mathbf{0}$ and $N = 6$).

Figure 3. Upper bound to error probability and true error probability vs.
noise level, for different numbers of measurements $M$, for a multiple class
classification problem ($r_{\Sigma_1} = 2$, $r_{\Sigma_2} = 3$, $r_{\Sigma_3} = 3$, $r_{\Sigma_4} = 2$,
$r_{\Sigma_{12}} = 4$, $r_{\Sigma_{13}} = 5$, $r_{\Sigma_{14}} = 4$, $r_{\Sigma_{23}} = 4$, $r_{\Sigma_{24}} = 5$,
$r_{\Sigma_{34}} = 4$, $\boldsymbol{\mu}_i = \mathbf{0}$, $i = 1, \ldots, 4$ and $N = 6$).

Figure 2 shows that, when $M \leq r_{\Sigma_{12}}$, the upper bound
exhibits an error floor, as expected. When $M > r_{\Sigma_{12}}$, the
upper bound decays exponentially to zero (at low noise
levels), as presented in Theorem 3.

In Figure 3 we can observe that the upper bound to
the misclassification probability is, indeed, dominated by the
behavior of the worst pair of classes (which in this scenario
corresponds to the pair $(C_2, C_3)$, yielding a maximum diversity
of $d = 1/2$ for $M \geq 4$), as can be seen by comparing with any
other pairwise upper bound (e.g. the behavior of the pair
$(C_1, C_3)$, also depicted in the figure).

VI. Conclusion 

This paper studies fundamental limits in compressive clas- 
sification of Gaussian mixture models. In particular, it is 
shown that the asymptotic behavior of (the upper bound to) 
the misclassification probability, which is intimately linked 
to the geometrical properties of the source and the mea- 
surement system, also captures well the behavior of the true 
misclassification probability. Moreover, it is recognized that 
the key quantities that determine the asymptotic behavior of 
the misclassification probability are akin to standard quantities 
used to characterize the behavior of the error probability in
multiple-antenna communications: diversity and coding gain.

Beyond the theoretical insight, the practical relevance of the
results relates to the possibility of integrating the
asymptotic characterizations with dictionary learning methods
for compressive classification. The diversity order and the
measurement gain - in view of their links to the geometry of the
measurement system and the geometry of the source - offer a
means to pose optimization problems for constructing
dictionaries with good discriminative power.